Abstract

A schema is a naturally defined subset of the space of fixed-length binary strings. The 
Holland Schema Theorem [Hol75] gives a lower bound on the expected fraction of a population 
in a schema after one generation of a simple genetic algorithm. This paper gives formulas for the 
exact expected fraction of a population in a schema after one generation of the simple genetic 
algorithm. 

Holland's schema theorem has three parts, one for selection, one for crossover, and one for 
mutation. The selection part is exact, whereas the crossover and mutation parts are approximations.
This paper shows how the crossover and mutation parts can be made exact. Holland's
schema theorem follows naturally as a corollary. 

There is a close relationship between schemata and the representation of the population in 
the Walsh basis. This relationship is used in the derivation of the results, and can also make 
computation of the schema averages more efficient. 

This paper gives a version of the Vose infinite population model where crossover and mutation 
are separated into two functions rather than a single "mixing" function. 
1 Introduction



Holland's schema theorem [Hol75] has been widely used for the theoretical analysis of genetic 
algorithms. However, it has two limitations. First, it only gives information for a single generation. 
Second, it is an approximation, giving only lower bounds on the schema frequencies. This paper 
removes the second limitation. 

Michael Vose and coworkers have introduced exact models of the simple genetic algorithm. The 
Vose infinite population model exactly describes the expected behavior from one generation to the 
next. The Markov chain model is an exact model of the finite population behavior of the simple 
GA. 

Stephens et al. [SW97] describe these models as "fine-grained". They can be used to qualitatively
describe certain aspects of the behavior of the simple GA. For example, the fixed points of the
infinite population model can be used to describe phenomena such as punctuated equilibria. (See 
[VL91] and [Vos99a] for example.) However, due to the large size of the models, it is generally 
impossible to apply these models quantitatively to practical-sized problems. 

Thus, as is pointed out in [SW97] and [SWA97], a more coarse-grained version of these models is
needed. Models are needed that describe the behavior of a subset of the variables included in
the exact models. For example, a higher-level organism may have on the order of
100,000 genes. However, population geneticists generally do not try to model all of these; instead
they may use 1-locus and 2-locus models. Modeling using schemata is the equivalent technique 
for string-representation genetic algorithms; they model the behavior of the GA at a subset of the 
string positions. 

In earlier work, Bridges and Goldberg [BG87] derived an exact expression for the expected number of
copies of a string under one generation of selection and one-point crossover, and they claim that
their formulas can be extended to find the expected number of elements in a schema under the 
same conditions. Their formulas are complex and not particularly illuminating. 

As mentioned before, Stephens and coworkers ([SW97] and [SWA97]) have results similar to ours 
for one-point crossover. Our results are more general than these results in that they apply to general
crossover, and they include mutation.

[SW97] includes references to other related papers. Of particular note is [Alt95] which relates an 
exact version of the schema theorem to Price's theorem in population genetics. 

Chapter 19 of [Vos99b] (which the author had not seen when he wrote this paper) also contains a 
version of the exact schema theorem as theorem 19.2 for mixing, where mixing includes crossover and 
mutation. Theorem 19.2 assumes that mutation is independent, which is similar to the assumptions 
on mutation in this paper. 
2 Notation

Let $\Omega$ be the space of length $\ell$ binary strings, and let $n = 2^\ell$. For $u, v \in \Omega$, let $u \otimes v$ denote
the bitwise-and of $u$ and $v$, and let $u \oplus v$ denote the bitwise-xor of $u$ and $v$. Let $\bar u$ denote the
ones-complement of $u$, and let $\#u$ denote the number of ones in the binary representation of $u$.

Integers in the interval $[0, n) = [0, 2^\ell)$ are identified with the elements of $\Omega$ through their binary
representation. This correspondence allows $\Omega$ to be regarded as the product group

$$\Omega = \mathbb{Z}_2 \times \cdots \times \mathbb{Z}_2$$

where the group operation is $\oplus$. The elements of $\Omega$ corresponding to the integers $2^i$, $i = 0, \ldots, \ell - 1$,
form a natural basis for $\Omega$.

We will also use column vectors of length $\ell$ to represent elements of $\Omega$. Let $\mathbf{1}$ denote the vector of
ones (or the integer $2^\ell - 1$). Thus, $u^T v = \#(u \otimes v)$, and $\bar u = \mathbf{1} \oplus u$.

For any $u \in \Omega$, let $\Omega_u$ denote the subgroup of $\Omega$ generated by $\{2^i : u \otimes 2^i = 2^i\}$. In other
words, $v \in \Omega_u$ if and only if $v \otimes u = v$. For example, if $\ell = 6$, then $\Omega_9 = \{0, 1, 8, 9\} =
\{000000, 000001, 001000, 001001\}$.

A schema is a subset of $\Omega$ where some string positions are specified (fixed) and some are unspecified
(variable). Schemata are traditionally denoted by pattern strings, where a special symbol
is used to denote an unspecified bit. We use the * symbol for this purpose (Holland used
the # symbol). Thus, the schema denoted by the pattern string 10*01* is the set of strings
$\{100010, 100011, 101010, 101011\}$.

Alternatively, we can define a schema to be the set $\Omega_u \oplus v$, where $u, v \in \Omega$ and $u \otimes v = 0$.
In this notation, $u$ is a mask for the variable positions, and $v$ specifies the fixed positions. For
example, the schema $\Omega_{001001} \oplus 100010$ is the schema 10*01* described above.

This definition makes it clear that a schema $\Omega_u \oplus v$ with $v = 0$ is a subgroup of $\Omega$, and a schema
$\Omega_u \oplus v$ is a coset of this subgroup.

Following standard practice, we define the order of a schema as the number of fixed positions.
In other words, the order of the schema $\Omega_{\bar u} \oplus v$ is $\#u$ (since $u$ is a mask for the fixed positions).

A population for a genetic algorithm over length $\ell$ binary strings is usually interpreted as a multiset
(set with repetitions) of elements of $\Omega$. A population can also be interpreted as a $2^\ell$-dimensional
incidence vector over the index set $\Omega$: if $X$ is a population vector, then $X_i$ is the number of
occurrences of $i \in \Omega$ in the population. A population vector can be normalized by dividing by the
population size. For a normalized population vector $x$, $\sum_i x_i = 1$. Let

$$\Delta = \Big\{ x \in \mathbb{R}^n : \sum_{i \in \Omega} x_i = 1 \text{ and } x_i \ge 0 \text{ for all } i \in \Omega \Big\}$$

Thus a normalized population vector is an element of $\Delta$. Geometrically, $\Delta$ is the $(n-1)$-dimensional
unit simplex in $\mathbb{R}^n$. Note that elements of $\Delta$ can be interpreted as probability distributions over $\Omega$.

If expr is a Boolean expression, then

$$[\text{expr}] = \begin{cases} 1 & \text{if expr is true} \\ 0 & \text{if expr is false} \end{cases}$$



3 The fraction of a population in a schema 



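Let $X$ be a population (not necessarily normalized). We will be interested in the fraction of
the elements of $X$ that are elements of the schema $\Omega_{\bar u} \oplus k$, where $k \in \Omega_u$:

$$\frac{\sum_{j \in \Omega_{\bar u}} X_{j \oplus k}}{\sum_{j \in \Omega} X_j}$$

Note that here $u$ is a mask for the fixed positions of the schema.

If we divide the numerator and denominator of this fraction by the population size $r$, and if we let
$x = X/r$, then we get

$$x^{(u)}_k = \sum_{j \in \Omega_{\bar u}} x_{j \oplus k}$$

In other words, for a normalized population $x$, we use the notation $x^{(u)}_k$ to denote the schema
average for the schema $\Omega_{\bar u} \oplus k$. Note that $x^{(0)}_0 = 1$ since $\sum_{i \in \Omega} x_i = 1$.

Let $x^{(u)}$ denote the vector of schema averages, where the vector is indexed over $\Omega_u$. Note that
$x^{(\mathbf 1)} = x$, since $\Omega_{\bar{\mathbf 1}} = \{0\}$.
For a fixed $u$, the family of schemata $\{\Omega_{\bar u} \oplus v : v \in \Omega_u\}$ is called a competing family of schemata.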
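The definition of $x^{(u)}$ translates into a few lines of Python. This is an illustrative sketch, not code from the paper: strings are encoded as integers as in section 2, the population is a dictionary mapping each string to its normalized frequency, and the helper names `omega_u` and `schema_averages` are hypothetical. It relies on the fact that $j \in \Omega_{\bar u} \oplus k$ exactly when $j \otimes u = k$.

```python
from fractions import Fraction

# Sketch only: helper names are illustrative, not from the paper.


def omega_u(u):
    """Elements k of the subgroup Omega_u (all k with k AND u == k), in increasing order."""
    return [k for k in range(u + 1) if k & u == k]


def schema_averages(x, u):
    """Return {k: x_k^(u)} for k in Omega_u.

    x maps a string (encoded as an int) to its normalized frequency.
    String j lies in the schema Omega_{u-bar} XOR k exactly when j AND u == k,
    so each frequency is added to the accumulator of its projection j & u.
    """
    avg = {k: Fraction(0) for k in omega_u(u)}
    for j, xj in x.items():
        avg[j & u] += xj
    return avg


# The population used as the running example in section 5: l = 5, r = 5, u = 10 = 01010.
pop = [6, 7, 10, 13, 21]
x = {j: Fraction(pop.count(j), len(pop)) for j in set(pop)}
avg = schema_averages(x, 10)
print([str(avg[k]) for k in omega_u(10)])  # ['1/5', '2/5', '1/5', '1/5']
```

Taking $u = 0$ returns the single average 1, and taking $u = \mathbf 1$ returns the population itself, matching the two remarks above.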



4 The Simple Genetic Algorithm 



The material in this section is mostly taken from [Vos99b] , [Vos96] , and [VW98a] . 

The simple genetic algorithm can be described through a heuristic function $\mathcal{G} : \Delta \to \Delta$. As we will
show later, $\mathcal{G}$ contains all of the details of selection, crossover, and mutation. The simple genetic
algorithm is given by:

1 Choose a random population of size $r$ from $\Omega$.

2 Express the population as an incidence vector $X$ indexed over $\Omega$.

3 Let $y = \mathcal{G}(X/r)$. (Note that $X/r$ and $y$ are probability distributions over $\Omega$.)

4 for $k$ from 1 to $r$ do

5     Select individual $i \in \Omega$ according to the probability distribution $y$.

6     Add $i$ to the next generation population $Z$.

7 endfor

8 Let $X = Z$.

9 Go to step 3.

It is shown in [Vos99b] that if $X$ is a population, then $y = \mathcal{G}(X/r)$ is the expected population after
one generation of the simple genetic algorithm. Thus, the schema theorem is a statement about
the schema averages of the population $y$.

The heuristic function $\mathcal{G}$ can be written as the composition of three heuristic functions $\mathcal{F}$, $\mathcal{C}$,
and $\mathcal{U}$, which describe selection, crossover, and mutation respectively. In other words, $\mathcal{G}(x) =
\mathcal{U}(\mathcal{C}(\mathcal{F}(x))) = \mathcal{U} \circ \mathcal{C} \circ \mathcal{F}(x)$. Later sections describe each of the three heuristic functions in more
detail.



5 Selection 



The selection heuristic for proportional selection is given by:

$$\mathcal{F}_k(x) = \frac{f_k x_k}{\sum_{j \in \Omega} f_j x_j}$$

where $f_k$ denotes the fitness of $k \in \Omega$.

Let $F$ denote the diagonal matrix over $\Omega \times \Omega$ whose diagonal entries are given by $F_{j,j} = f_j$. Then
the selection heuristic can be expressed in terms of matrices by

$$\mathcal{F}(x) = \frac{Fx}{\mathbf{1}^T F x}$$

If $X$ is a finite population represented as an incidence vector over $\Omega$, and if $x = X/r$, then $x_k$
is nonzero only for those $k$ that are in the population $X$ considered as a multiset. Thus, the
computation of $\mathcal{F}(x)$ is feasible in practice even for long string lengths. Further, the schema
averages after selection can be computed directly from the definition.

Theorem 5.1 (Exact schema theorem for proportional selection.) Let $x \in \Delta$ be a population, and
let $s = \mathcal{F}(x)$. Then

$$s^{(u)}_k = \frac{\sum_{j \in \Omega_{\bar u}} f_{j \oplus k}\, x_{j \oplus k}}{\sum_{i \in \Omega} f_i x_i}$$

We give the following algorithm for computing the schema average vector $s^{(u)}$ from a finite
population $X$. Let $I(u) = \{i : 0 \le i < \ell \text{ and } u_i = 1\}$, where $u_i$ denotes bit $i$ of $u$. Let $P^{(u)}$ be the
function which projects $\Omega$ onto $\Omega_u$: for $j \in \Omega$, let $P^{(u)}(j)_i = j_i$ for $i \in I(u)$ and $P^{(u)}(j)_i = 0$ otherwise;
equivalently, $P^{(u)}(j) = j \otimes u$.

for each $k \in \Omega_u$ do
    $s^{(u)}_k \leftarrow 0$
endfor
for each $j \in X$ do                ▷ see note below
    $k \leftarrow P^{(u)}(j)$
    $s^{(u)}_k \leftarrow s^{(u)}_k + f_j$
endfor
$\bar f \leftarrow 0$
for each $k \in \Omega_u$ do
    $\bar f \leftarrow \bar f + s^{(u)}_k$
endfor
for each $k \in \Omega_u$ do
    $s^{(u)}_k \leftarrow s^{(u)}_k / \bar f$
endfor
return $s^{(u)}$

In this algorithm, the population $X$ is interpreted as a multiset. Thus, "for each $j \in X$ do" means
that the body of the loop is executed once for each of the possibly multiple occurrences of $j$ in $X$. In
an implementation, it would be useful to identify the elements of $\Omega_u$ with the integers in the interval
$[0, 2^{\#u})$, and to interpret $s^{(u)}$ as a vector indexed over these integers.

Clearly, the complexity of this algorithm is $O(2^{\#u} + rK)$, where $K$ denotes the complexity of one
fitness evaluation.

We now give an example which we will continue through the remaining sections.

Let $\ell = 5$, $u = 10 = 01010_2$, $r = 5$, $X = \{6, 7, 10, 13, 21\} = \{00110, 00111, 01010, 01101, 10101\}$.
Then $\Omega_u = \{0, 2, 8, 10\}$, and the schema sum vector is $x^{(10)} = \langle \frac15, \frac25, \frac15, \frac15 \rangle$. Let $f_6 = 5$, $f_7 = 3$,
$f_{10} = 4$, $f_{13} = 1$, $f_{21} = 7$. This gives $\bar f = 20$. The schema sum vector after selection is
$s^{(10)} = \frac{1}{20}\langle 7, 8, 1, 4 \rangle$.
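The selection algorithm above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the population is a list with repetitions (so multiset semantics come for free), fitness is any callable, $P^{(u)}(j)$ is computed as the bitwise AND `j & u`, and the names `submasks` and `schema_averages_after_selection` are hypothetical. Exact fractions reproduce the example's $s^{(10)}$.

```python
from fractions import Fraction

# Sketch only: names are illustrative, not from the paper.


def submasks(u):
    """Yield every k with k & u == k (the elements of Omega_u) in O(2^#u) steps."""
    k = u
    while True:
        yield k
        if k == 0:
            return
        k = (k - 1) & u


def schema_averages_after_selection(population, u, fitness):
    """Schema averages s^(u) of F(X/r) under proportional selection.

    population: list of strings encoded as ints (repetitions allowed).
    u:          mask of the fixed positions.
    fitness:    callable returning the fitness f_j of string j.
    """
    s = {k: Fraction(0) for k in submasks(u)}   # first loop: zero the accumulators
    for j in population:                        # once per occurrence of j
        s[j & u] += fitness(j)                  # k = P^(u)(j) = j AND u
    fbar = sum(s.values())                      # total fitness of the population
    return {k: v / fbar for k, v in s.items()}  # normalize


f = {6: 5, 7: 3, 10: 4, 13: 1, 21: 7}
s10 = schema_averages_after_selection([6, 7, 10, 13, 21], 10, f.get)
print({k: str(s10[k]) for k in sorted(s10)})
# {0: '7/20', 2: '2/5', 8: '1/20', 10: '1/5'}
```

A dictionary keyed by the elements of $\Omega_u$ is used here instead of the packed integer indexing suggested in the text, purely for readability; enumerating submasks keeps the initialization cost at $O(2^{\#u})$ rather than $O(u)$.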



6 Holland's Schema Theorem 



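We can now state Holland's Schema theorem [Hol75].
As in [VW98a], for $u \in \Omega$, define

$$\mathrm{hi}(u) = \begin{cases} 0 & \text{if } u = 0 \\ \max\{i : 2^i \otimes u > 0\} & \text{otherwise} \end{cases} \qquad \mathrm{lo}(u) = \begin{cases} \ell - 1 & \text{if } u = 0 \\ \min\{i : 2^i \otimes u > 0\} & \text{otherwise} \end{cases}$$

Intuitively, the function $\mathrm{hi}(u)$ returns the index of the high-order bit of $u$, and $\mathrm{lo}(u)$ returns the index of
the low-order bit. Let $\mathcal{L}(u) = \mathrm{hi}(u) - \mathrm{lo}(u)$. $\mathcal{L}(u)$ is often called the defining length of $u$.

Theorem 6.1 (Holland's approximate schema theorem.) Let $x \in \Delta$ be a normalized population,
and let $y = \mathcal{G}(x)$, where $\mathcal{G}$ includes proportional selection, one-point crossover with crossover rate
$c$, and bitwise mutation with mutation rate $p$. Then, for $k \in \Omega_u$,

$$y^{(u)}_k \ge \left(1 - c\,\frac{\mathcal{L}(u)}{\ell - 1}\right)(1 - p)^{\#u}\,\frac{\sum_{j \in \Omega_{\bar u}} f_{j \oplus k}\, x_{j \oplus k}}{\sum_{i \in \Omega} f_i x_i}$$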
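The quantities in theorem 6.1 are easy to compute. The following sketch uses hypothetical helper names and takes bit $i$ of an integer as string position $i$, as in section 2. It implements hi, lo, and $\mathcal{L}$ as defined above, together with the multiplicative factor $(1 - c\,\mathcal{L}(u)/(\ell-1))(1-p)^{\#u}$ that theorem 6.1 applies to the post-selection schema average.

```python
from fractions import Fraction

# Sketch only: helper names are illustrative, not from the paper.


def hi(u):
    """Index of the high-order 1 bit of u; 0 if u == 0."""
    return u.bit_length() - 1 if u else 0


def lo(u, ell):
    """Index of the low-order 1 bit of u; ell - 1 if u == 0."""
    return (u & -u).bit_length() - 1 if u else ell - 1


def defining_length(u, ell):
    return hi(u) - lo(u, ell)


def holland_factor(u, ell, c, p):
    """Lower-bound factor of theorem 6.1 for one-point crossover rate c and mutation rate p."""
    order = bin(u).count("1")
    return (1 - c * Fraction(defining_length(u, ell), ell - 1)) * (1 - p) ** order


# u = 10 = 01010 has bits 1 and 3 set, so L(u) = 2.
print(defining_length(10, 5))                                  # 2
print(holland_factor(10, 5, Fraction(1, 2), Fraction(1, 8)))   # 147/256
```

The expression `u & -u` isolates the lowest set bit, so its bit length minus one is the index $\mathrm{lo}(u)$ for $u \ne 0$.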



7 The Walsh Basis 

The Walsh matrix $W$ has dimension $2^\ell$ by $2^\ell$, and has elements defined by

$$W_{i,j} = 2^{-\ell/2}(-1)^{i^T j} = \frac{1}{\sqrt n}(-1)^{i^T j}$$

Note that $W$ is symmetric and orthogonal ($WW = I$). The columns of $W$ define a basis for $\mathbb{R}^n$
called the Walsh basis.

As an example, for $\ell = 2$,

$$W = \frac12\begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

If $x$ is a vector over $\Omega$, then $\hat x = Wx$ can be interpreted as $x$ written in the Walsh basis, and if $M$
is a matrix over $\Omega \times \Omega$, then $\hat M = WMW$ can be interpreted as $M$ written in the Walsh basis.
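A direct construction of $W$ makes the definition concrete. This Python sketch uses floating point and a hypothetical function name. It builds $W$ from $W_{i,j} = 2^{-\ell/2}(-1)^{i^T j}$, where $i^T j$ is the number of ones in `i & j`, reproduces the $\ell = 2$ matrix above, and checks that $W$ is symmetric with $WW = I$.

```python
import math

# Sketch only: the function name is illustrative.


def walsh_matrix(ell):
    """W[i][j] = 2^(-ell/2) * (-1)^(number of ones in i AND j)."""
    n = 1 << ell
    scale = 1 / math.sqrt(n)
    return [[scale * (-1) ** bin(i & j).count("1") for j in range(n)] for i in range(n)]


W = walsh_matrix(2)
print([[int(2 * w) for w in row] for row in W])
# [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]

# W is symmetric and is its own inverse: W W = I (up to rounding).
n = len(W)
WW = [[sum(W[i][k] * W[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
assert all(W[i][j] == W[j][i] for i in range(n) for j in range(n))
assert all(abs(WW[i][j] - (i == j)) < 1e-12 for i in range(n) for j in range(n))
```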

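We are also interested in vectors and matrices indexed over $\Omega_u$. If $\hat x$ is a vector over $\Omega$ written in
the Walsh basis, let $\hat x^{(u)}_k = 2^{\#\bar u/2}\hat x_k$ for $k \in \Omega_u$. Theorem 7.1 will show that $\hat x^{(u)}$ is the Walsh
transform of $x^{(u)}$.

We can define a Walsh matrix indexed over $\Omega_u \times \Omega_u$. For $i, j \in \Omega_u$, define

$$W^{(u)}_{i,j} = 2^{-\#u/2}(-1)^{i^T j}$$

Like $W$, the matrix $W^{(u)}$ is symmetric and orthogonal.

The following theorem shows how the schema sum vector is related to the Walsh coefficients of the
population.

Theorem 7.1 For any $u \in \Omega$,

$$\hat x^{(u)} = W^{(u)} x^{(u)}$$

Proof. Since $W^{(u)}$ is its own inverse, it suffices to show that $W^{(u)}\hat x^{(u)} = x^{(u)}$. For $k \in \Omega_u$,

$$\left(W^{(u)}\hat x^{(u)}\right)_k = 2^{-\#u/2}\sum_{i \in \Omega_u}(-1)^{i^T k}\,2^{\#\bar u/2}\,2^{-\ell/2}\sum_{w \in \Omega}(-1)^{i^T w}x_w = 2^{-\#u}\sum_{w \in \Omega}x_w\sum_{i \in \Omega_u}(-1)^{i^T(w \oplus k)} \qquad (1)$$

$$= \sum_{j \in \Omega_{\bar u}} x_{j \oplus k} = x^{(u)}_k \qquad (2)$$

To see how equation (2) follows from (1), note that $\sum_{i \in \Omega_u}(-1)^{i^T v} = 0$ if $v \in \Omega_u$ and $v \ne 0$. Since
$i^T(w \oplus k) = i^T\big((w \oplus k) \otimes u\big)$ for $i \in \Omega_u$, the inner sum in (1) equals $2^{\#u}$ when $(w \oplus k) \otimes u = 0$,
that is, when $w \in \Omega_{\bar u} \oplus k$, and equals 0 otherwise. ■

Theorem 7.1 shows that the schema averages of a family of competing schemata determine the Walsh
coefficients of the population in a coordinate subspace in the Walsh basis. To be more specific,
consider $u$ as fixed. Then $x^{(u)}$ denotes the schema averages of the family of competing schemata
$\Omega_{\bar u} \oplus k$, where $k$ varies over $\Omega_u$. Theorem 7.1 shows that these schema averages determine $\hat x^{(u)}$,
which is a rescaling of the projection of $\hat x$ onto the coordinate subspace generated by the elements
of $\Omega_u$.

To continue the example started in section 5, if $s^{(10)} = \frac{1}{20}\langle 7, 8, 1, 4 \rangle$, then $\hat s^{(10)} = W^{(10)}s^{(10)} =
\frac{1}{20}\langle 10, -2, 5, 1 \rangle$.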
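Theorem 7.1 can be checked numerically on the running example. The sketch below (floating point, hypothetical helper names) builds the full post-selection distribution $s = \mathcal{F}(x)$ over all $2^5$ strings, computes $\hat s^{(10)}_k = 2^{\#\bar u/2}\hat s_k$ from the full Walsh coefficients, and compares the result with $W^{(10)}s^{(10)}$ computed from only the four schema averages.

```python
# Sketch only: helper names are illustrative, not from the paper.


def walsh_sign(v):
    """(-1)^(number of ones in v), as +1 or -1."""
    return -1 if bin(v).count("1") % 2 else 1


ell, u = 5, 10
fitness = {6: 5, 7: 3, 10: 4, 13: 1, 21: 7}               # each string occurs once
total = sum(fitness.values())
s = [fitness.get(j, 0) / total for j in range(1 << ell)]  # s = F(x) over all of Omega

omega = [k for k in range(u + 1) if k & u == k]           # Omega_u = [0, 2, 8, 10]
nu = bin(u).count("1")                                   # #u
s_u = {k: sum(s[j] for j in range(1 << ell) if j & u == k) for k in omega}

# Left side: hat{s}^(u)_k = 2^(#ubar/2) * hat{s}_k, with hat{s} = W s over all of Omega.
s_hat = {k: sum(walsh_sign(k & j) * s[j] for j in range(1 << ell)) / 2 ** (ell / 2)
         for k in omega}
lhs = {k: 2 ** ((ell - nu) / 2) * s_hat[k] for k in omega}

# Right side: W^(u) s^(u), using only the 2^#u schema averages.
rhs = {k: sum(walsh_sign(k & j) * s_u[j] for j in omega) / 2 ** (nu / 2) for k in omega}

print([round(20 * rhs[k], 9) for k in omega])  # [10.0, -2.0, 5.0, 1.0]
```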



8 Crossover 

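If parent strings $i$ and $j$ are crossed using a crossover mask $m \in \Omega$, the children are $(i \otimes m) \oplus (j \otimes \bar m)$
and $(i \otimes \bar m) \oplus (j \otimes m)$. In the simple genetic algorithm, one child is chosen randomly from the pair
of children.

For each binary string $m \in \Omega$, let $\chi_m$ be the probability of using $m$ as a crossover mask.
The crossover matrix is given by

$$C_{i,j} = \sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,\big[(i \otimes m) \oplus (j \otimes \bar m) = 0\big]$$

$C_{i,j}$ is the probability of obtaining $0$ as the result of the crossover of $i$ and $j$.

Let $\sigma_k$ be the permutation matrix with $i,j$th entry given by $[j \oplus i = k]$. Then $(\sigma_k x)_i = x_{i \oplus k}$. Define
the crossover heuristic $\mathcal{C} : \Delta \to \Delta$ by

$$\mathcal{C}_k(x) = (\sigma_k x)^T C\,\sigma_k x \quad \text{for } k \in \Omega$$

Corollary 3.3 of [VW98a] gives that the Walsh transform of the crossover matrix $C$ is equal to $C$; that is, $\hat C = C$.

Vose and Wright [VW98a] show that the $k$th component of $\mathcal{C}(x)$ with respect to the Walsh basis is

$$\widehat{\mathcal{C}(x)}_k = \sqrt n\sum_{i \in \Omega_k}\hat x_i\,\hat x_{i \oplus k}\,\hat C_{i,i \oplus k}$$

where $n = 2^\ell$.

Theorem 8.1 (The crossover heuristic in the Walsh basis.) Let $x \in \Delta$ and let $\hat y$ denote $\mathcal{C}(x)$
expressed in the Walsh basis. Then

$$\hat y_k = \sqrt n\sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,\hat x_{k \otimes m}\,\hat x_{k \otimes \bar m}$$

Proof.

$$\hat y_k = \sqrt n\sum_{i \in \Omega_k}\hat x_i\,\hat x_{i \oplus k}\,C_{i,i \oplus k} = \sqrt n\sum_{i \in \Omega_k}\hat x_i\,\hat x_{i \oplus k}\sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\big[(i \otimes m = 0) \wedge ((i \oplus k) \otimes \bar m = 0)\big]$$

$$= \sqrt n\sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\sum_{i \in \Omega_k}\hat x_i\,\hat x_{i \oplus k}\big[(i \otimes m = 0) \wedge ((i \oplus k) \otimes \bar m = 0)\big]$$

The condition in the square brackets can only be satisfied when $i = k \otimes \bar m$, and in this case
$i \oplus k = (k \otimes \bar m) \oplus k = k \otimes m$. Thus,

$$\hat y_k = \sqrt n\sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,\hat x_{k \otimes m}\,\hat x_{k \otimes \bar m}$$

■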
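Theorem 8.1 can be checked by brute force on a small instance. The sketch below is an illustration under stated assumptions, not the paper's code: it uses one-point crossover with rate $c$ (mask 0 with probability $1-c$, masks $2^i - 1$ with probability $c/(\ell-1)$, as described in the proof of corollary 8.5), builds $C_{i,j}$ from its definition, evaluates $\mathcal{C}_k(x) = (\sigma_k x)^T C\,\sigma_k x$ for a random $x \in \Delta$, and compares the Walsh transform of the result with the right side of theorem 8.1. The function names are hypothetical.

```python
import math
import random

# Sketch only: names are illustrative; one-point crossover is assumed.


def one_point_masks(ell, c):
    """chi[m] for one-point crossover with rate c (requires ell >= 2)."""
    chi = [0.0] * (1 << ell)
    chi[0] += 1 - c
    for i in range(1, ell):
        chi[(1 << i) - 1] += c / (ell - 1)
    return chi


def walsh(v):
    """hat{v} = W v, computed directly from the definition of W."""
    n = len(v)
    return [sum((-1) ** bin(k & j).count("1") * v[j] for j in range(n)) / math.sqrt(n)
            for k in range(n)]


def crossover_heuristic(x, chi):
    """C_k(x) = sum over i, j of x_{i xor k} C_{i,j} x_{j xor k}."""
    n = len(x)
    full = n - 1
    # C[i][j]: probability that crossing i and j produces the string 0.
    C = [[sum((chi[m] + chi[full ^ m]) / 2 for m in range(n)
              if (i & m) ^ (j & (full ^ m)) == 0)
          for j in range(n)] for i in range(n)]
    return [sum(x[i ^ k] * C[i][j] * x[j ^ k] for i in range(n) for j in range(n))
            for k in range(n)]


ell, c = 3, 0.6
n, full = 1 << ell, (1 << ell) - 1
random.seed(1)
w = [random.random() for _ in range(n)]
x = [wi / sum(w) for wi in w]                     # a random point of Delta
chi = one_point_masks(ell, c)

y = crossover_heuristic(x, chi)
direct = walsh(y)
x_hat = walsh(x)
formula = [math.sqrt(n) * sum((chi[m] + chi[full ^ m]) / 2 * x_hat[k & m] * x_hat[k & (full ^ m)]
                              for m in range(n))
           for k in range(n)]
print(max(abs(a - b) for a, b in zip(direct, formula)) < 1e-12)  # True
```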



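Theorem 8.2 (The crossover heuristic for schemata in the Walsh basis.) Let $x \in \Delta$ and let $\hat y$ denote
$\mathcal{C}(x)$ expressed in the Walsh basis. Then, for $k \in \Omega_u$,

$$\hat y^{(u)}_k = \sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,\hat x^{(u \otimes m)}_{k \otimes m}\,\hat x^{(u \otimes \bar m)}_{k \otimes \bar m}$$

Proof. By the definition of $\hat y^{(u)}$ and theorem 8.1,

$$\hat y^{(u)}_k = 2^{\#\bar u/2}\hat y_k = 2^{\#\bar u/2}\,2^{\ell/2}\sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,2^{-\#\overline{(u \otimes m)}/2}\,\hat x^{(u \otimes m)}_{k \otimes m}\;2^{-\#\overline{(u \otimes \bar m)}/2}\,\hat x^{(u \otimes \bar m)}_{k \otimes \bar m}$$

Consider the exponents:

$$-\#\overline{(u \otimes m)}/2 - \#\overline{(u \otimes \bar m)}/2 = -\ell/2 + \#(u \otimes m)/2 - \ell/2 + \#(u \otimes \bar m)/2 = -\ell + \#u/2$$

Thus, since $\#\bar u/2 + \ell/2 - \ell + \#u/2 = 0$,

$$\hat y^{(u)}_k = 2^{\#\bar u/2 + \ell/2 - \ell + \#u/2}\sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,\hat x^{(u \otimes m)}_{k \otimes m}\,\hat x^{(u \otimes \bar m)}_{k \otimes \bar m} = \sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,\hat x^{(u \otimes m)}_{k \otimes m}\,\hat x^{(u \otimes \bar m)}_{k \otimes \bar m}$$

■

To continue the numerical example, suppose that one-point crossover with crossover rate 1/2 is
applied to the $\ell = 5$ population for which $\hat s^{(10)} = \frac{1}{20}\langle 10, -2, 5, 1 \rangle$, and let $\hat y$ denote $\mathcal{C}(s)$ in the
Walsh basis. We want to compute $\hat y^{(10)}_k$ for $k = 0, 2, 8, 10$. For $k = 0, 2, 8$, for every crossover mask $m$,
either $k \otimes m = 0$ or $k \otimes \bar m = 0$. If, say, $k \otimes \bar m = 0$, then the corresponding term is $\hat s^{(10 \otimes m)}_k\,\hat s^{(10 \otimes \bar m)}_0$,
which equals $\hat s^{(10)}_k$ because $\hat s^{(v)}_k = 2^{\#\bar v/2}\hat s_k$ and $\hat s_0 = 2^{-\ell/2}$. Since the weights $(\chi_m + \chi_{\bar m})/2$ sum to 1,

$$\hat y^{(10)}_k = \hat s^{(10)}_k \quad \text{for } k = 0, 2, 8.$$

For $k = 10$, there are four nontrivial crossover masks, each with probability 1/8. For two
of these ($m = 00011$ and $m = 00111$), $k \otimes m \ne 0$ and $k \otimes \bar m \ne 0$. This gives

$$\hat y^{(10)}_{10} = \frac34\hat s^{(10)}_{10} + \frac14\hat s^{(2)}_2\,\hat s^{(8)}_8 = \frac34\hat s^{(10)}_{10} + \frac14\left(\sqrt2\,\hat s^{(10)}_2\right)\left(\sqrt2\,\hat s^{(10)}_8\right) = \frac34\cdot\frac{1}{20} + \frac14\cdot 2\cdot\frac{-2}{20}\cdot\frac{5}{20} = \frac{3}{80} - \frac{1}{80} = \frac{1}{40}$$

Thus, $\hat y^{(10)} = \frac{1}{40}\langle 20, -4, 10, 1 \rangle$.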
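The computation above can be repeated for any crossover distribution and any $u$. The sketch below (floating point, hypothetical names, one-point masks assumed) implements theorem 8.2 using only $\hat s^{(u)}$: for a submask $v$ of $u$ and $k \in \Omega_v$, the definition $\hat s^{(v)}_k = 2^{\#\bar v/2}\hat s_k$ gives $\hat s^{(v)}_k = 2^{(\#u - \#v)/2}\hat s^{(u)}_k$, so no other input is needed.

```python
# Sketch only: names are illustrative; one-point crossover is assumed.


def popcount(v):
    return bin(v).count("1")


def one_point_masks(ell, c):
    chi = [0.0] * (1 << ell)
    chi[0] += 1 - c
    for i in range(1, ell):
        chi[(1 << i) - 1] += c / (ell - 1)
    return chi


def crossover_walsh_schema(s_hat_u, u, chi, ell):
    """Theorem 8.2 evaluated for every k in Omega_u (the keys of s_hat_u)."""
    full = (1 << ell) - 1

    def s_hat(v, k):
        # hat{s}^(v)_k = 2^((#u - #v)/2) * hat{s}^(u)_k for v a submask of u, k in Omega_v.
        return 2 ** ((popcount(u) - popcount(v)) / 2) * s_hat_u[k]

    return {k: sum((chi[m] + chi[full ^ m]) / 2
                   * s_hat(u & m, k & m) * s_hat(u & (full ^ m), k & (full ^ m))
                   for m in range(1 << ell))
            for k in s_hat_u}


s_hat_10 = {0: 10 / 20, 2: -2 / 20, 8: 5 / 20, 10: 1 / 20}
y_hat_10 = crossover_walsh_schema(s_hat_10, 10, one_point_masks(5, 0.5), 5)
print({k: round(40 * v, 9) for k, v in y_hat_10.items()})
# {0: 20.0, 2: -4.0, 8: 10.0, 10: 1.0}
```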



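The following theorem gives a simple formula for the exact change in the expected schema averages
after crossover. It is a restatement of theorem ?? of [SW97] and theorem ?? of [SWA97]. It can also
be easily derived from theorem 19.2 of [Vos99b] by setting the mutation rate to zero.

Theorem 8.3 (Exact schema theorem for crossover.) Let $x$ be a population, and let $y = \mathcal{C}(x)$.
Then, for $k \in \Omega_u$,

$$y^{(u)}_k = \sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,x^{(u \otimes m)}_{k \otimes m}\,x^{(u \otimes \bar m)}_{k \otimes \bar m} \qquad (3)$$

Proof. Fix $m$ and write each $k \in \Omega_u$ uniquely as $k = k_1 \oplus k_2$ with $k_1 = k \otimes m \in \Omega_{u \otimes m}$ and
$k_2 = k \otimes \bar m \in \Omega_{u \otimes \bar m}$, and similarly $j = j_1 \oplus j_2$. Since $(-1)^{k^T j} = (-1)^{k_1^T j_1}(-1)^{k_2^T j_2}$ and
$2^{-\#u/2} = 2^{-\#(u \otimes m)/2}\,2^{-\#(u \otimes \bar m)/2}$, we have $W^{(u)}_{k,j} = W^{(u \otimes m)}_{k_1,j_1}W^{(u \otimes \bar m)}_{k_2,j_2}$. Applying $W^{(u)}$ to
both sides of theorem 8.2 and using theorem 7.1 therefore gives

$$y^{(u)}_k = \left(W^{(u)}\hat y^{(u)}\right)_k = \sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\left(W^{(u \otimes m)}\hat x^{(u \otimes m)}\right)_{k \otimes m}\left(W^{(u \otimes \bar m)}\hat x^{(u \otimes \bar m)}\right)_{k \otimes \bar m} = \sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\,x^{(u \otimes m)}_{k \otimes m}\,x^{(u \otimes \bar m)}_{k \otimes \bar m}$$

■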



Theorems 8.1, 8.2, and 8.3 show that the effect of crossover using mask $m$ is to move the population
towards linkage equilibrium relative to $m$. Following the population biologists (see [CK70] for
example), we define the population $x$ to be in linkage equilibrium relative to mask $m$ if $x_k =
x^{(m)}_{k \otimes m}\,x^{(\bar m)}_{k \otimes \bar m}$, or equivalently if $\hat x_k = \hat x^{(m)}_{k \otimes m}\,\hat x^{(\bar m)}_{k \otimes \bar m}$, for all $k \in \Omega$. If a population is in linkage
equilibrium with respect to all masks of a family of crossover masks that separates any pair of bit
positions, then the population will be completely determined by the order 1 schema averages (or
equivalently the Walsh coefficients $\hat x_k$ with $\#k = 1$). This is formalized in theorem 3.0 of [VW98b]
(Geiringer's theorem).

Continuing the numerical example, suppose that one-point crossover with crossover rate 1/2 is
applied to the $\ell = 5$ population whose schema averages for $u = 10$ are given by $s^{(10)} = \frac{1}{20}\langle
7, 8, 1, 4 \rangle$. To apply theorem 8.3, we need $s^{(2)}$ and $s^{(8)}$. These are easily obtained from $s^{(10)}$:

$$s^{(2)}_0 = s^{(10)}_0 + s^{(10)}_8 = \frac25, \qquad s^{(2)}_2 = s^{(10)}_2 + s^{(10)}_{10} = \frac35, \qquad s^{(8)}_0 = s^{(10)}_0 + s^{(10)}_2 = \frac34, \qquad s^{(8)}_8 = s^{(10)}_8 + s^{(10)}_{10} = \frac14$$

Let $y = \mathcal{C}(s)$. As before, the probability of a crossover mask for which $u \otimes m \ne 0$ and $u \otimes \bar m \ne 0$ is
1/4. Thus,

$$y^{(10)}_0 = \frac34 s^{(10)}_0 + \frac14 s^{(2)}_0 s^{(8)}_0 = \frac34\cdot\frac{7}{20} + \frac14\cdot\frac25\cdot\frac34 = \frac{27}{80}$$

$$y^{(10)}_2 = \frac34 s^{(10)}_2 + \frac14 s^{(2)}_2 s^{(8)}_0 = \frac34\cdot\frac{8}{20} + \frac14\cdot\frac35\cdot\frac34 = \frac{33}{80}$$

$$y^{(10)}_8 = \frac34 s^{(10)}_8 + \frac14 s^{(2)}_0 s^{(8)}_8 = \frac34\cdot\frac{1}{20} + \frac14\cdot\frac25\cdot\frac14 = \frac{5}{80} = \frac{1}{16}$$

$$y^{(10)}_{10} = \frac34 s^{(10)}_{10} + \frac14 s^{(2)}_2 s^{(8)}_8 = \frac34\cdot\frac{4}{20} + \frac14\cdot\frac35\cdot\frac14 = \frac{15}{80} = \frac{3}{16}$$

One can check that $W^{(10)}y^{(10)}$ is the vector $\hat y^{(10)} = \frac{1}{40}\langle 20, -4, 10, 1 \rangle$ computed earlier.
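Equation (3) is straightforward to evaluate with exact rational arithmetic. In this sketch (hypothetical names; one-point crossover masks as in the proof of corollary 8.5), the lower-order schema averages $x^{(v)}$ for a submask $v$ of $u$ are obtained by summing entries of $x^{(u)}$, exactly as $s^{(2)}$ and $s^{(8)}$ were obtained above.

```python
from fractions import Fraction

# Sketch only: names are illustrative; one-point crossover is assumed.


def one_point_masks(ell, c):
    """chi[m] for one-point crossover with rate c (exact fractions)."""
    chi = [Fraction(0)] * (1 << ell)
    chi[0] += 1 - c
    for i in range(1, ell):
        chi[(1 << i) - 1] += c / (ell - 1)
    return chi


def coarsen(x_u, v):
    """x^(v) from x^(u) for v a submask of u: x^(v)_j = sum of x^(u)_k over k with k & v == j."""
    out = {}
    for k, val in x_u.items():
        out[k & v] = out.get(k & v, 0) + val
    return out


def exact_crossover_schema(x_u, u, chi, ell):
    """Equation (3): y^(u)_k = sum_m (chi_m + chi_mbar)/2 * x^(u&m)_{k&m} * x^(u&mbar)_{k&mbar}."""
    full = (1 << ell) - 1
    y = {k: Fraction(0) for k in x_u}
    for m in range(1 << ell):
        weight = (chi[m] + chi[full ^ m]) / 2
        if weight == 0:
            continue
        a = coarsen(x_u, u & m)
        b = coarsen(x_u, u & (full ^ m))
        for k in x_u:
            y[k] += weight * a[k & m] * b[k & (full ^ m)]
    return y


s10 = {0: Fraction(7, 20), 2: Fraction(8, 20), 8: Fraction(1, 20), 10: Fraction(4, 20)}
y10 = exact_crossover_schema(s10, 10, one_point_masks(5, Fraction(1, 2)), 5)
print({k: str(v) for k, v in y10.items()})
# {0: '27/80', 2: '33/80', 8: '1/16', 10: '3/16'}
```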

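Corollary 8.4 (Approximate schema theorem for crossover.) Let $x$ be a population, and let $y =
\mathcal{C}(x)$. Then, for $k \in \Omega_u$,

$$y^{(u)}_k \ge x^{(u)}_k\sum_{m \in \Omega}\frac{\chi_m + \chi_{\bar m}}{2}\big[(u \otimes m = u) \vee (u \otimes \bar m = u)\big]$$

Note that the summation over $m$ includes just those crossover masks that do not "split" the mask
$u$.

Proof. For $m$ such that $u \otimes m = u$, we have $u \otimes \bar m = 0$, $k \otimes m = k$, and $k \otimes \bar m = 0$, so

$$x^{(u \otimes m)}_{k \otimes m} = x^{(u)}_k \quad\text{and}\quad x^{(u \otimes \bar m)}_{k \otimes \bar m} = x^{(0)}_0 = 1$$

Similarly, for $m$ such that $u \otimes \bar m = u$, we have $u \otimes m = 0$ and

$$x^{(u \otimes m)}_{k \otimes m} = x^{(0)}_0 = 1 \quad\text{and}\quad x^{(u \otimes \bar m)}_{k \otimes \bar m} = x^{(u)}_k$$

The terms in the summation of equation (3) for which $(u \otimes m = u) \vee (u \otimes \bar m = u)$ is not true
are nonnegative. Thus, if we drop those terms from the summation, we get the inequality of the
corollary. ■

Corollary 8.5 (Holland's approximate schema theorem for one-point crossover.) Let $x$ be a population,
and let $y = \mathcal{C}(x)$, where $\mathcal{C}$ is defined through one-point crossover with a crossover rate of $c$.
Then, for $k \in \Omega_u$,

$$y^{(u)}_k \ge x^{(u)}_k\left(1 - c\,\frac{\mathcal{L}(u)}{\ell - 1}\right)$$

Proof. One-point crossover can be defined using $\ell$ crossover masks with a nonzero probability. The
crossover mask 0 has probability $1 - c$, and the masks of the form $2^i - 1$, $i = 1, \ldots, \ell - 1$, have
probability $c/(\ell - 1)$. The number of these masks for which $(u \otimes m = u) \vee (u \otimes \bar m = u)$ is not
true is $\mathcal{L}(u)$. Since the condition is unchanged when $m$ is replaced by $\bar m$, the sum in corollary 8.4
equals $\sum_m \chi_m[(u \otimes m = u) \vee (u \otimes \bar m = u)]$, which is

$$(1 - c) + \frac{c\,(\ell - 1 - \mathcal{L}(u))}{\ell - 1} = 1 - c\,\frac{\mathcal{L}(u)}{\ell - 1}$$

■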
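Corollary 8.5 rests on counting the one-point masks that split $u$. The following sketch (hypothetical names, exact fractions, one-point masks as in the proof above) evaluates the sum in corollary 8.4 for every nonzero $u$ with $\ell = 5$ and confirms that it equals $1 - c\,\mathcal{L}(u)/(\ell - 1)$.

```python
from fractions import Fraction

# Sketch only: names are illustrative; one-point crossover is assumed.


def one_point_masks(ell, c):
    chi = [Fraction(0)] * (1 << ell)
    chi[0] += 1 - c
    for i in range(1, ell):
        chi[(1 << i) - 1] += c / (ell - 1)
    return chi


def defining_length(u):
    """hi(u) - lo(u) for u != 0."""
    return (u.bit_length() - 1) - ((u & -u).bit_length() - 1)


def survival_weight(u, chi, ell):
    """Sum over m of (chi_m + chi_mbar)/2 * [(u & m == u) or (u & mbar == u)]."""
    full = (1 << ell) - 1
    return sum((chi[m] + chi[full ^ m]) / 2
               for m in range(1 << ell)
               if u & m == u or u & (full ^ m) == u)


ell, c = 5, Fraction(1, 2)
chi = one_point_masks(ell, c)
checks = [survival_weight(u, chi, ell) == 1 - c * Fraction(defining_length(u), ell - 1)
          for u in range(1, 1 << ell)]
print(all(checks), survival_weight(10, chi, ell))  # True 3/4
```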




It is not hard to give similar approximate schema theorems for other forms of crossover, such as
two-point crossover and uniform crossover.



9 Mutation 



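In the Vose model, mutation is defined by means of mutation masks. If $j \in \Omega$, then the result of
mutating $j$ using a mutation mask $m \in \Omega$ is $j \oplus m$. The mutation heuristic is defined by giving
a probability distribution $\mu \in \Delta$ over mutation masks. In other words, $\mu_m$ is the probability that
$m \in \Omega$ is used. Given a population $x \in \Delta$, the mutation heuristic $\mathcal{U} : \Delta \to \Delta$ is defined by

$$\mathcal{U}_k(x) = \sum_{j \in \Omega}\mu_{j \oplus k}\,x_j$$

The mutation heuristic is a linear operator: it can be defined as multiplication by the matrix $U$,
where $U_{j,k} = \mu_{j \oplus k}$. In other words, $\mathcal{U}(x) = Ux$.

In the Walsh basis, the mutation heuristic is represented by a diagonal matrix.

Lemma 9.1 The $k$th component of the mutation heuristic in the Walsh basis is given by

$$\widehat{\mathcal{U}(x)}_k = \sqrt n\,\hat\mu_k\,\hat x_k$$

where $n = 2^\ell$.

Proof. It is sufficient to show that the Walsh transform $\hat U$ of $U$ is diagonal with $\hat U_{k,k} = \sqrt n\,\hat\mu_k$, since

$$\widehat{\mathcal{U}(x)} = WUx = (WUW)(Wx) = \hat U\hat x$$

The following shows that $\hat U$ is diagonal.

$$\hat U_{i,j} = \sum_{v,w \in \Omega}W_{i,v}\,\mu_{v \oplus w}\,W_{w,j} = \frac1n\sum_{v}\sum_{w}(-1)^{i^T v + w^T j}\mu_{v \oplus w}$$

We now do a change of variable. Let $u = v \oplus w$, which implies that $w = v \oplus u$.

$$\hat U_{i,j} = \frac1n\sum_{v}\sum_{u}(-1)^{i^T v + (v \oplus u)^T j}\mu_u = \frac1n\sum_{u}(-1)^{u^T j}\mu_u\sum_{v}(-1)^{v^T(i \oplus j)} = [i = j]\sum_{u}(-1)^{u^T j}\mu_u = [i = j]\sqrt n\,\hat\mu_j$$

■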
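Lemma 9.1 can be confirmed numerically. The sketch below (floating point, hypothetical helper names) uses an arbitrary random mask distribution $\mu$ rather than the independent-bit model introduced below. It forms $U_{j,k} = \mu_{j \oplus k}$, computes $\hat U = WUW$, and checks that the off-diagonal entries vanish and that $\hat U_{k,k} = \sqrt n\,\hat\mu_k$.

```python
import math
import random

# Sketch only: names are illustrative; mu is an arbitrary distribution for the check.


def walsh_matrix(ell):
    n = 1 << ell
    return [[(-1) ** bin(i & j).count("1") / math.sqrt(n) for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


ell = 3
n = 1 << ell
random.seed(0)
weights = [random.random() for _ in range(n)]
mu = [w / sum(weights) for w in weights]                 # an arbitrary mask distribution
U = [[mu[j ^ k] for k in range(n)] for j in range(n)]    # U_{j,k} = mu_{j xor k}
W = walsh_matrix(ell)
U_hat = matmul(matmul(W, U), W)
mu_hat = [sum(W[k][j] * mu[j] for j in range(n)) for k in range(n)]

off_diagonal = max(abs(U_hat[i][j]) for i in range(n) for j in range(n) if i != j)
diagonal_error = max(abs(U_hat[k][k] - math.sqrt(n) * mu_hat[k]) for k in range(n))
print(off_diagonal < 1e-12, diagonal_error < 1e-12)  # True True
```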

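Define $\mu^{(u)}_k = \sum_{j \in \Omega_{\bar u}}\mu_{k \oplus j}$ for $k \in \Omega_u$. Theorem 7.1 shows that $\hat\mu^{(u)} = W^{(u)}\mu^{(u)}$, where
$\hat\mu^{(u)}_k = 2^{\#\bar u/2}\hat\mu_k$ for all $k \in \Omega_u$.

Define the $2^{\#u} \times 2^{\#u}$ matrix $U^{(u)}$ by $U^{(u)}_{j,k} = \mu^{(u)}_{j \oplus k}$. Note that $U = U^{(\mathbf 1)}$. The proof of lemma
9.1 shows that the Walsh transform $W^{(u)}U^{(u)}W^{(u)}$ of $U^{(u)}$ is diagonal, with $k$th diagonal entry $2^{\#u/2}\hat\mu^{(u)}_k$.
Thus, it is consistent to write $\hat U^{(u)}$ for this Walsh transform.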
We now assume that each string position $i$, $i = 0, 1, \ldots, \ell - 1$, is mutated independently of the other
positions: with probability $p_i$, the bit at position $i$ is flipped. If all of the $p_i$ are equal to a
common value $p$, then $p$ is called the mutation rate.

Under this assumption, the probability distribution for mutation masks is given by

$$\mu_m = \prod_{i=0}^{\ell-1}p_i^{m_i}(1 - p_i)^{1 - m_i} \qquad (4)$$

where $m_i$ denotes bit $i$ of $m$, and where $0^0$ is interpreted to be 1. For example, the distribution for
$\ell = 2$ is the vector

$$\langle (1-p_0)(1-p_1),\; p_0(1-p_1),\; (1-p_0)p_1,\; p_0p_1 \rangle^T$$

We now want to show that there is an equation similar to (4) for $\mu^{(u)}_m$. The next lemma is a step
in that direction. For $u \in \Omega$, define $I(u) = \{i : 0 \le i < \ell \text{ and } u_i = 1\}$.



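Lemma 9.2 For $u \in \Omega$,

$$\sum_{k \in \Omega_u}\prod_{i \in I(u)}p_i^{k_i}(1 - p_i)^{1 - k_i} = 1 \qquad (5)$$

Proof. The proof is by induction on $\#u$.

If $\#u = 1$, then $u = 2^j$ for some $j$, and $I(u) = \{j\}$. Also, $\Omega_u = \{0, u\}$. Thus, the left side of equation
(5) is $p_j^0(1 - p_j)^1 + p_j^1(1 - p_j)^0 = (1 - p_j) + p_j = 1$.

If $\#u > 1$, let $u = v \oplus w$ with $v \otimes w = 0$, $\#v > 0$, $\#w > 0$. Then

$$\sum_{k \in \Omega_u}\prod_{i \in I(u)}p_i^{k_i}(1-p_i)^{1-k_i} = \sum_{k \in \Omega_v}\sum_{r \in \Omega_w}\Big(\prod_{i \in I(v)}p_i^{k_i}(1-p_i)^{1-k_i}\Big)\Big(\prod_{j \in I(w)}p_j^{r_j}(1-p_j)^{1-r_j}\Big)$$

$$= \Big(\sum_{k \in \Omega_v}\prod_{i \in I(v)}p_i^{k_i}(1-p_i)^{1-k_i}\Big)\Big(\sum_{r \in \Omega_w}\prod_{j \in I(w)}p_j^{r_j}(1-p_j)^{1-r_j}\Big) = 1$$

by the induction hypothesis. ■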



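Lemma 9.3 For $u \in \Omega$ and $m \in \Omega_u$,

$$\mu^{(u)}_m = \prod_{i \in I(u)}p_i^{m_i}(1 - p_i)^{1 - m_i} \qquad (6)$$

Proof. For $j \in \Omega_{\bar u}$, the bits of $m \oplus j$ in the positions $I(u)$ are those of $m$, and the bits in the
positions $I(\bar u)$ are those of $j$. Thus, by equation (4),

$$\mu^{(u)}_m = \sum_{j \in \Omega_{\bar u}}\mu_{m \oplus j} = \prod_{i \in I(u)}p_i^{m_i}(1-p_i)^{1-m_i}\sum_{j \in \Omega_{\bar u}}\prod_{i \in I(\bar u)}p_i^{j_i}(1-p_i)^{1-j_i} = \prod_{i \in I(u)}p_i^{m_i}(1-p_i)^{1-m_i}$$

where the last step uses lemma 9.2 with $\bar u$ in place of $u$. ■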
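Equations (4) and (6) can be compared directly. In this sketch (exact fractions, hypothetical names, `p[i]` is the flip probability of bit $i$), `mask_distribution` implements equation (4), `marginal` computes $\mu^{(u)}_m = \sum_{j \in \Omega_{\bar u}}\mu_{m \oplus j}$ by brute force, and `product_formula` implements the right side of equation (6). It needs Python 3.8 or later for `math.prod`.

```python
from fractions import Fraction
from math import prod

# Sketch only: names are illustrative, not from the paper.


def mask_distribution(p):
    """Equation (4): mu_m = prod_i p_i^{m_i} (1 - p_i)^{1 - m_i}."""
    ell = len(p)
    return [prod(p[i] if (m >> i) & 1 else 1 - p[i] for i in range(ell))
            for m in range(1 << ell)]


def marginal(mu, u, m):
    """mu^(u)_m = sum of mu_{m xor j} over j in Omega_{u-bar}."""
    ubar = (len(mu) - 1) ^ u
    return sum(mu[m ^ j] for j in range(ubar + 1) if j & ubar == j)


def product_formula(p, u, m):
    """Right side of equation (6): product over i in I(u) only."""
    return prod(p[i] if (m >> i) & 1 else 1 - p[i]
                for i in range(len(p)) if (u >> i) & 1)


print(mask_distribution([Fraction(1, 10), Fraction(1, 5)]))
# [Fraction(18, 25), Fraction(2, 25), Fraction(9, 50), Fraction(1, 50)]

p = [Fraction(1, 8), Fraction(1, 5), Fraction(1, 3), Fraction(1, 8), Fraction(1, 2)]
mu = mask_distribution(p)
u = 0b01010
print(all(marginal(mu, u, m) == product_formula(p, u, m)
          for m in range(u + 1) if m & u == m))  # True
```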



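The next step is to compute the Walsh transform of the mutation probability distribution under this
assumption. It is helpful to do a change of coordinates. For each $i = 0, 1, \ldots, \ell - 1$, let $q_i = 1 - 2p_i$.
Under this change of coordinates, equation (6) is equivalent to

$$\mu^{(u)}_m = 2^{-\#u}\prod_{i \in I(u)}\big(1 + (1 - 2m_i)q_i\big)$$

Lemma 9.4 For $m \in \Omega_u$,

$$\hat\mu^{(u)}_m = 2^{-\#u/2}\prod_{i \in I(m)}q_i$$

Proof. The proof is by induction on $\#u$. For the base case, assume that $\#u = 1$. Then $u = 2^i$ for
some $i$, and

$$\hat\mu^{(u)} = W^{(u)}\mu^{(u)} = \frac{1}{\sqrt2}\begin{pmatrix}1 & 1\\ 1 & -1\end{pmatrix}\frac12\begin{pmatrix}1 + q_i\\ 1 - q_i\end{pmatrix} = \frac{1}{\sqrt2}\begin{pmatrix}1\\ q_i\end{pmatrix}$$

For $\#u > 1$, let $u = v \oplus w$ where $v \otimes w = 0$, $v \ne 0$, and $w \ne 0$. Every $j \in \Omega_u$ can be written uniquely
as $j = j_v \oplus j_w$ with $j_v \in \Omega_v$ and $j_w \in \Omega_w$, and $\mu^{(u)}_j = \mu^{(v)}_{j_v}\mu^{(w)}_{j_w}$ by equation (6). Then

$$\hat\mu^{(u)}_m = 2^{-\#(v \oplus w)/2}\sum_{j \in \Omega_u}(-1)^{m^T j}\mu^{(u)}_j = \Big(2^{-\#v/2}\sum_{j_v \in \Omega_v}(-1)^{(m \otimes v)^T j_v}\mu^{(v)}_{j_v}\Big)\Big(2^{-\#w/2}\sum_{j_w \in \Omega_w}(-1)^{(m \otimes w)^T j_w}\mu^{(w)}_{j_w}\Big)$$

$$= \hat\mu^{(v)}_{m \otimes v}\,\hat\mu^{(w)}_{m \otimes w} = 2^{-\#v/2}\prod_{i \in I(m \otimes v)}q_i\cdot 2^{-\#w/2}\prod_{i \in I(m \otimes w)}q_i = 2^{-\#u/2}\prod_{i \in I(m)}q_i$$

by the induction hypothesis. ■

Lemma 9.5 For $m \in \Omega$,

$$\hat\mu_m = 2^{-\ell/2}\prod_{i \in I(m)}q_i$$

Proof. Take $u = \mathbf 1$ in lemma 9.4, noting that $\mu^{(\mathbf 1)} = \mu$ and $\hat\mu^{(\mathbf 1)} = \hat\mu$. ■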



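Theorem 9.6 (The mutation heuristic in the Walsh basis.) Let $x \in \Delta$ be a population, and let
$y = \mathcal{U}(x)$. If $k \in \Omega_u$, then

$$\hat y^{(u)}_k = \Big(\prod_{i \in I(k)}q_i\Big)\hat x^{(u)}_k$$

Proof.

$$\hat y^{(u)}_k = 2^{\#\bar u/2}\hat y_k = 2^{\#\bar u/2}\,2^{\ell/2}\hat\mu_k\hat x_k \quad \text{by lemma 9.1}$$

$$= 2^{\#\bar u/2}\,2^{\ell/2}\,2^{-\ell/2}\Big(\prod_{i \in I(k)}q_i\Big)\hat x_k \quad \text{by lemma 9.5}$$

$$= \Big(\prod_{i \in I(k)}q_i\Big)\hat x^{(u)}_k$$

■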
Theorem 9.6 shows how mutation affects a population. If $k \ne 0$, and if $0 < p_i < 1/2$ for every $i$,
then $0 < \prod_{i \in I(k)}q_i < 1$. Thus, $|\hat y^{(u)}_k| \le |\hat x^{(u)}_k|$, with strict inequality whenever $\hat x^{(u)}_k \ne 0$. Mutation
decreases the magnitude of the schema Walsh coefficients (except for the index coefficient $\hat x^{(u)}_0$, which is
constant at $2^{-\#u/2}$). If all of these Walsh coefficients were zero, then theorem 7.1 shows that all of the
corresponding schema averages would be equal (to $2^{-\#u}$). In other words, mutation drives the population
towards uniformity.

To continue the numerical example, we apply mutation with a mutation rate of 1/8 to the
population $y$ of the previous section. We start with $\hat y^{(10)} = \frac{1}{40}\langle 20, -4, 10, 1 \rangle$. Let $z = \mathcal{U}(y)$. For
all $i$, $q = q_i = 1 - 2p_i = 1 - 1/4 = 3/4$. Thus,

$$\hat z^{(10)} = \frac{1}{40}\big\langle 20,\; -4q,\; 10q,\; q^2 \big\rangle = \frac{1}{40}\Big\langle 20, -3, \frac{15}{2}, \frac{9}{16} \Big\rangle = \frac{1}{640}\langle 320, -48, 120, 9 \rangle$$
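Theorem 9.6 makes mutation a coordinatewise scaling in the Walsh basis. The sketch below (exact fractions, hypothetical function name, `q[i]` $= 1 - 2p_i$) applies that scaling to $\hat y^{(10)}$ from the crossover example.

```python
from fractions import Fraction

# Sketch only: the function name is illustrative.


def mutation_walsh_schema(y_hat_u, q):
    """Theorem 9.6: hat{z}^(u)_k = (product of q_i over i in I(k)) * hat{y}^(u)_k."""
    z_hat = {}
    for k, value in y_hat_u.items():
        factor = Fraction(1)
        for i in range(k.bit_length()):
            if (k >> i) & 1:          # i is in I(k)
                factor *= q[i]
        z_hat[k] = factor * value
    return z_hat


p = Fraction(1, 8)
q = [1 - 2 * p] * 5                   # q_i = 3/4 for every bit
y_hat_10 = {0: Fraction(20, 40), 2: Fraction(-4, 40), 8: Fraction(10, 40), 10: Fraction(1, 40)}
z_hat_10 = mutation_walsh_schema(y_hat_10, q)
print({k: str(v) for k, v in z_hat_10.items()})
# {0: '1/2', 2: '-3/40', 8: '3/16', 10: '9/640'}
```

Only the index coefficient $k = 0$ is left unchanged, since $I(0)$ is empty.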
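Lemma 9.7 For $u \in \Omega$, if $y = \mathcal{U}(x)$, then $\hat y^{(u)} = \hat U^{(u)}\hat x^{(u)}$, where $\hat U^{(u)}$ is the diagonal matrix with
$\hat U^{(u)}_{k,k} = \prod_{i \in I(k)}q_i$.

Proof. This is just a rewriting of the equation of theorem 9.6 into matrix form. ■

The following theorem can be easily derived from theorem 19.2 of [Vos99b] by setting the crossover
rate to zero.

Theorem 9.8 (The exact schema theorem for mutation.) Let $x \in \Delta$ be a population, and let
$y = \mathcal{U}(x)$ where $\mathcal{U}$ corresponds to mutating bit $i$ with probability $p_i$ for $i = 0, 1, \ldots, \ell - 1$. Then

$$y^{(u)} = U^{(u)}x^{(u)} \qquad (7)$$

Proof. By theorem 7.1 and lemma 9.7,

$$y^{(u)} = W^{(u)}\hat y^{(u)} = W^{(u)}\hat U^{(u)}\hat x^{(u)} = W^{(u)}\hat U^{(u)}W^{(u)}x^{(u)} = U^{(u)}x^{(u)}$$

■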



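We continue the numerical example. We start with the schema averages computed in the crossover
section: $y^{(10)} = \frac{1}{80}\langle 27, 33, 5, 15 \rangle$, and let $z = \mathcal{U}(y)$ where $\mathcal{U}$ corresponds to mutation with a
mutation rate of $p = 1/8$. Recall that $\mu^{(10)}$ is given by equation (6), so

$$\mu^{(10)} = \langle (1-p)^2,\; p(1-p),\; (1-p)p,\; p^2 \rangle = \frac{1}{64}\langle 49, 7, 7, 1 \rangle$$

The entries of $U^{(10)}$ are given by $U^{(10)}_{j,k} = \mu^{(10)}_{j \oplus k}$, so

$$U^{(10)} = \frac{1}{64}\begin{pmatrix}49 & 7 & 7 & 1\\ 7 & 49 & 1 & 7\\ 7 & 1 & 49 & 7\\ 1 & 7 & 7 & 49\end{pmatrix}, \qquad z^{(10)} = U^{(10)}y^{(10)} = \frac{1}{1280}\langle 401, 479, 143, 257 \rangle$$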
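The same numbers can be produced from equation (7) without the Walsh transform. This sketch (exact fractions, hypothetical names, independent bit flips with probability `p[i]`) builds $\mu^{(u)}$ from equation (6) and multiplies by $U^{(u)}_{j,k} = \mu^{(u)}_{j \oplus k}$.

```python
from fractions import Fraction

# Sketch only: names are illustrative, not from the paper.


def mu_u(u, p):
    """Equation (6): mu^(u)_m = product over i in I(u) of p_i^{m_i} (1 - p_i)^{1 - m_i}, m in Omega_u."""
    result = {}
    for m in (k for k in range(u + 1) if k & u == k):
        value = Fraction(1)
        for i in range(u.bit_length()):
            if (u >> i) & 1:
                value *= p[i] if (m >> i) & 1 else 1 - p[i]
        result[m] = value
    return result


def exact_mutation_schema(y_u, u, p):
    """Equation (7): z^(u) = U^(u) y^(u) with U^(u)_{j,k} = mu^(u)_{j xor k}."""
    mu = mu_u(u, p)
    return {j: sum(mu[j ^ k] * y_u[k] for k in y_u) for j in y_u}


p = [Fraction(1, 8)] * 5
y10 = {0: Fraction(27, 80), 2: Fraction(33, 80), 8: Fraction(5, 80), 10: Fraction(15, 80)}
print({m: str(v) for m, v in mu_u(10, p).items()})
# {0: '49/64', 2: '7/64', 8: '7/64', 10: '1/64'}
z10 = exact_mutation_schema(y10, 10, p)
print({k: str(v) for k, v in z10.items()})
# {0: '401/1280', 2: '479/1280', 8: '143/1280', 10: '257/1280'}
```

Applying $W^{(10)}$ to this $z^{(10)}$ gives the vector $\frac{1}{640}\langle 320, -48, 120, 9 \rangle$, which is what theorem 9.6 predicts in the Walsh basis.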
Corollary 9.9 (The approximate schema theorem for mutation.) Let $x \in \Delta$ be a normalized
population, and let $y = \mathcal{U}(x)$. Assume that $\mathcal{U}$ corresponds to mutation where each bit is mutated
(flipped) with probability $p$. Then

$$y^{(u)} \ge (1 - p)^{\#u}x^{(u)}$$

Proof. The diagonal entries of $U^{(u)}$ are all equal to $\mu^{(u)}_0 = \prod_{i \in I(u)}(1 - p_i)$. Under the assumption
of this corollary, $p_i = p$ for all $i$, so the diagonal entries of $U^{(u)}$ are all equal to $(1 - p)^{\#u}$. The
off-diagonal entries of $U^{(u)}$ are all nonnegative. If we drop the off-diagonal entries in the computation
of equation (7), we get the result of this corollary. ■



10 Computational Complexity 



In this section we give the computational complexity of computing the schema averages for a family
of competing schemata after one generation of the simple GA.

It is more efficient to compute the schema averages after selection in the standard basis using
the algorithm given in section 5, convert to the Walsh basis using the fast Walsh transform (see
Appendix A), compute the effects of crossover and mutation in the Walsh basis, and convert back to
standard coordinates using the fast Walsh transform. Converting from $x^{(u)}$ to $\hat x^{(u)}$ by the fast Walsh
transform has complexity $\Theta(\#u \cdot 2^{\#u})$ ([Vos99b]). The complexity of the computation of theorem
8.2 is $\Theta(\#u \cdot 2^{\#u})$ for one- or two-point crossover (since the summation over $m$ is $\Theta(\#u)$). The
complexity of the computation of theorem 9.6 is also $\Theta(\#u \cdot 2^{\#u})$. Thus, the overall computational
complexity (assuming an initial finite population and one- or two-point crossover) is

$$\Theta(\#u \cdot 2^{\#u} + rK)$$

where $K$ was defined as the cost of doing one function evaluation. Note that the only dependence
on the string length is through $K$. Thus, it is possible to compute schema averages exactly for very
long string lengths.
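The fast Walsh transform mentioned above can be sketched as the standard in-place butterfly. This Python version is an illustrative sketch. It assumes that $\Omega_u$ has been identified with $[0, 2^{\#u})$ by packing the bits selected by $u$ (the hypothetical helper `pack`), which preserves $(-1)^{i^T j}$. It performs $\#u$ passes of $2^{\#u - 1}$ butterflies each and then scales by $2^{-\#u/2}$, so it applies $W^{(u)}$ in $\Theta(\#u \cdot 2^{\#u})$ operations.

```python
import math

# Sketch only: names are illustrative, not from the paper.


def pack(k, u):
    """Map k in Omega_u to [0, 2^#u) by keeping only the bits of k selected by u."""
    out, pos = 0, 0
    for i in range(u.bit_length()):
        if (u >> i) & 1:
            out |= ((k >> i) & 1) << pos
            pos += 1
    return out


def fwht(v):
    """Normalized fast Walsh transform: W v with W[i][j] = 2^(-d/2) (-1)^(i.j), len(v) = 2^d."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:                      # d passes of n/2 butterflies each
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = v[i], v[i + h]
                v[i], v[i + h] = a + b, a - b
        h *= 2
    scale = 1 / math.sqrt(n)
    return [scale * t for t in v]


u = 10
s10 = {0: 7 / 20, 2: 8 / 20, 8: 1 / 20, 10: 4 / 20}
packed = [0.0] * (1 << bin(u).count("1"))
for k, value in s10.items():
    packed[pack(k, u)] = value
print([round(20 * t, 9) for t in fwht(packed)])  # [10.0, -2.0, 5.0, 1.0]
```

Because the normalized transform is its own inverse, the same routine converts Walsh coefficients back to schema averages.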
11 Conclusion



We have given a version of the Vose infinite population model where the crossover heuristic function 
and the mutation heuristic function are separate functions, rather than combined into a single 
mixing heuristic function. 

We have shown how the expected behavior of a simple genetic algorithm relative to a family of 
competing schemata can be computed exactly over one generation. 

As was mentioned in section 7, these schema averages over a family of competing schemata correspond
to a coordinate subspace of $\Delta$ as expressed in the Walsh basis. In [VW98a], it was shown that
the mixing (crossover and mutation) heuristic is invariant over coordinate subspaces in the Walsh
basis. We have explicitly shown how the Vose infinite population model (the heuristic function
$\mathcal{G}$) can be computed on these subspaces. In fact, the model works in essentially the same way on
schema averages as it does on individual strings.

The formulas are simply stated and easy to understand. They are computationally feasible to apply 
even for very long string lengths if the order of the family of competing schemata is small. (The 
formulas are exponential in the order of the schemata.) 

A result like the exact schema theorem is most useful if it can be applied over multiple generations. 
The results of this paper show that the obstacle to doing this is selection, rather than crossover 
and mutation. The result of the exact schema theorem is the exact schema averages of the family 
of competing schemata (or the corresponding Walsh coefficients) after one generation. These correspond
to an "infinite population" which has nonzero components over all elements of $\Omega$. If the
string length is long and no assumptions are made about the fitness function, then the effect of
selection on the schema averages for the next generation will be computationally infeasible to compute.
Thus, in order to apply the exact schema theorem over multiple generations for practically
realistic string lengths, one will have to make assumptions about the fitness function. A subsequent 
paper will explore this problem. 

Acknowledgements: The author would like to thank Yong Zhao, who proofread a version of 
this paper. 
Appendix: Table of Notation

$[e]$ = 1 if $e$ is true, 0 if $e$ is false

$\ell$ The string length

$c$ The arity of the alphabet used in the string representation

$\Omega$ The set of binary strings of length $\ell$

$n = c^\ell$ The number of elements of $\Omega$

$r$ The population size

$u \oplus v$ The strings $u$ and $v$ are bitwise added mod 2 (or bitwise XORed)

$u \otimes v$ The strings $u$ and $v$ are bitwise multiplied mod 2 (or bitwise ANDed)

$\bar u$ The ones complement of the string $u$

$\#u$ The number of ones in the binary string $u$

$k^T j$ The same as $\#(k \otimes j)$, the number of ones in $k \otimes j$

$\Delta$ The set of nonnegative real-valued vectors indexed over $\Omega$ whose sum is 1
= the set of normalized populations
= the set of probability distributions over $\Omega$

$\Omega_u$ $= \{k \in \Omega : u \otimes k = k\}$

$\Omega_{\bar u} \oplus v$ $= \{j \oplus v : j \in \Omega_{\bar u}\}$ = the schema with fixed positions masked by $u$ and specified by $v$

$x^{(u)}_v$ $= \sum_{j \in \Omega_{\bar u}} x_{j \oplus v}$ (assuming that $\sum_j x_j = 1$): the schema average or sum for the schema $\Omega_{\bar u} \oplus v$

$x^{(u)}$ The vector of schema averages for the family of schemata $\{\Omega_{\bar u} \oplus v : v \in \Omega_u\}$

$W$ The Walsh transform matrix, indexed over $\Omega \times \Omega$: $W_{i,j} = 2^{-\ell/2}(-1)^{i^T j}$

$W^{(u)}$ The Walsh transform matrix, indexed over $\Omega_u \times \Omega_u$: $W^{(u)}_{i,j} = 2^{-\#u/2}(-1)^{i^T j}$

$\hat x$ $= Wx$, the Walsh transform of the normalized population $x$

$\hat x^{(u)}_k$ $= 2^{\#\bar u/2}\hat x_k$, also the Walsh transform $W^{(u)}x^{(u)}$ of $x^{(u)}$ with respect to $\Omega_u$

$\chi_m$ The probability that $m \in \Omega$ is used as a crossover mask

$\mu_m$ The probability that $m \in \Omega$ is used as a mutation mask

$p_i$ The probability that bit $i$ is flipped in the mutation step

$q_i$ $= 1 - 2p_i$

$U$ The matrix indexed over $\Omega \times \Omega$ and defined by $U_{j,k} = \mu_{j \oplus k}$

$U^{(u)}$ The matrix indexed over $\Omega_u \times \Omega_u$ and defined by $U^{(u)}_{j,k} = \mu^{(u)}_{j \oplus k}$